
Conversation

pytorchbot (Collaborator)

This PR was created by the merge bot to help merge the original PR into the main branch.
ghstack PR number: #12574 by @ahmtox
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/ahmtox/42/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/ahmtox/42/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/gh/ahmtox/41/orig
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/ahmtox/42/orig
@diff-train-skip-merge

Pull Request resolved: #12574

# Changes
* Introduce the `linear_qta8a_qga4w` custom operator in `custom_ops_lib.py` to handle dynamic-activation + grouped-weight quantized linear operations
* Add pattern matching and fusion logic in `FuseQuantizedOpsTransform` to detect dequant + dequant + linear sequences and replace them with the new fused operator (see the fusion-pass sketch after this list)
* Implement comprehensive test coverage in `test_vulkan_passes.py` to validate the QTA8A_QGA4W fusion pattern
* Add 4-bit weight packing utilities and grouped quantization support for efficient memory usage
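
The following is a rough, hypothetical torch.fx-style sketch of the pattern-matching idea behind the fusion, not the actual `FuseQuantizedOpsTransform` implementation: the `linear_qta8a_qga4w_stub` target, the `_is_dequant` check, and the forwarded arguments are placeholders.

```python
# Hypothetical torch.fx-style sketch of the fusion idea; the real pass is
# FuseQuantizedOpsTransform, whose op targets and signatures differ.
import torch
from torch import fx


def linear_qta8a_qga4w_stub(x_q, w_packed):
    """Placeholder target standing in for the registered fused operator."""
    raise NotImplementedError("stand-in for the fused linear_qta8a_qga4w op")


def _is_dequant(node) -> bool:
    """Hypothetical check: is this node a dequantize call feeding the linear?"""
    return (
        isinstance(node, fx.Node)
        and node.op == "call_function"
        and "dequantize" in str(node.target)
    )


def fuse_qta8a_qga4w(gm: fx.GraphModule) -> fx.GraphModule:
    for node in list(gm.graph.nodes):
        # Match the pattern: linear(dequant(activation), dequant(weight)).
        if node.op != "call_function" or node.target != torch.ops.aten.linear.default:
            continue
        if len(node.args) < 2:
            continue
        act_dq, w_dq = node.args[0], node.args[1]
        if not (_is_dequant(act_dq) and _is_dequant(w_dq)):
            continue
        # Rewire so the fused call consumes the still-quantized tensors directly.
        # A real pass would also forward scales, zero-points, and the group size.
        with gm.graph.inserting_before(node):
            fused = gm.graph.call_function(
                linear_qta8a_qga4w_stub, args=(act_dq.args[0], w_dq.args[0])
            )
        node.replace_all_uses_with(fused)
        gm.graph.erase_node(node)
    # Unused dequant nodes are dropped once nothing consumes them.
    gm.graph.eliminate_dead_code()
    gm.recompile()
    return gm
```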

# Motivation
The existing quantization workflow in the Vulkan backend processes dynamic-activation + grouped-weight quantized linear operations as separate quantize/dequantize/linear steps, which creates performance overhead through:
* Multiple kernel dispatches instead of a single fused operation
* Intermediate tensor allocations for dequantized weights and activations
* Suboptimal memory bandwidth utilization

The new `linear_qta8a_qga4w` operator fuses the entire sequence into a single operation that:
* Directly processes 8-bit quantized activations with per-token scales/zero-points
* Handles 4-bit grouped quantized weights with configurable group sizes
* Eliminates intermediate dequantization steps by performing dequantization inline
* Reduces memory footprint through packed 4-bit weight storage (see the reference sketch after this list)
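
As a rough eager-mode reference for what the fused operator computes (not the actual Vulkan kernel or its signature), assuming two 4-bit weight values packed per uint8 byte, per-group weight scales/zero-points, and per-token int8 activation scales/zero-points:

```python
# Illustrative reference only; tensor layouts, nibble order, and argument names
# are assumptions, not the fused kernel's actual interface.
import torch


def linear_qta8a_qga4w_reference(
    x_q: torch.Tensor,      # [M, K] int8 quantized activations
    x_scale: torch.Tensor,  # [M] per-token activation scales
    x_zero: torch.Tensor,   # [M] per-token activation zero-points
    w_packed: torch.Tensor, # [N, K // 2] uint8, two 4-bit values per byte
    w_scale: torch.Tensor,  # [N, K // group_size] per-group weight scales
    w_zero: torch.Tensor,   # [N, K // group_size] per-group weight zero-points
    group_size: int,
) -> torch.Tensor:
    # Unpack 4-bit weights; low nibble first, high nibble second (assumption).
    low = (w_packed & 0x0F).to(torch.int8)
    high = (w_packed >> 4).to(torch.int8)
    w_q = torch.stack([low, high], dim=-1).reshape(w_packed.shape[0], -1)  # [N, K]

    # Dequantize weights group by group: w = (w_q - zero) * scale.
    N, K = w_q.shape
    w_q = w_q.reshape(N, K // group_size, group_size).to(torch.float32)
    w = (w_q - w_zero.unsqueeze(-1)) * w_scale.unsqueeze(-1)
    w = w.reshape(N, K)

    # Dequantize activations per token: x = (x_q - zero) * scale.
    x = (x_q.to(torch.float32) - x_zero.unsqueeze(-1)) * x_scale.unsqueeze(-1)

    # The fused kernel performs this dequantization inline during the matmul
    # instead of materializing w and x as intermediate tensors.
    return x @ w.t()
```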

This aligns with the broader goal of optimizing quantized model inference in the Vulkan backend by leveraging graph-level transformations to improve computational efficiency while maintaining numerical accuracy.
ghstack-source-id: 299473616
@exported-using-ghexport

Differential Revision: [D78291269](https://our.internmc.facebook.com/intern/diff/D78291269/)
pytorchbot requested a review from SS-JIA as a code owner on July 30, 2025 16:15

pytorch-bot (bot) commented on Jul 30, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/13000

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla bot added the CLA Signed label on Jul 30, 2025
ahmtox and others added 2 commits July 30, 2025 09:17
…up (#13001)

This PR was created by the merge bot to help merge the original PR into the main branch.
ghstack PR number: #12575 by @ahmtox
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/ahmtox/43/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/ahmtox/43/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/gh/ahmtox/42/orig
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/ahmtox/43/orig
@diff-train-skip-merge

cc @SS-JIA @manuelcandales @cbilgin

---------

Co-authored-by: morelos <morelos@devvm4573.ash0.facebook.com>
Co-authored-by: ahmtox <69552192+ahmtox@users.noreply.github.com>
Gasoonjia merged commit b7b0604 into gh/ahmtox/41/orig on Jul 30, 2025
21 of 22 checks passed
Gasoonjia deleted the gh/ahmtox/42/orig branch on July 30, 2025 17:09